Route analytics queries by index setting#5432
Conversation
PR Reviewer Guide 🔍(Review updated until commit 28ac22a)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 28ac22a
Previous suggestionsSuggestions up to commit d504f77
|
Today `RestUnifiedQueryAction.isAnalyticsIndex` dispatches to the analytics engine when the source index name starts with `parquet_`. That's brittle — it conflates naming convention with storage type. An index created without the prefix but with pluggable dataformat enabled is silently sent to the Lucene path; an index named `parquet_foo` without the setting is mis-dispatched to analytics. Use the authoritative signal instead: the `index.pluggable.dataformat.enabled` setting on cluster-state metadata. This is the same setting integration tests (`CoordinatorReduceIT`, `CompositeCommitDeletionIT`, etc.) already use to create analytics-backed indices, and it's what `FieldStorageResolver` reads to decide field-level storage. Behavior: - `index.pluggable.dataformat.enabled=true` → analytics engine (DataFusion) - flag absent / false / index missing → Calcite→OpenSearch DSL path Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
d504f77 to
28ac22a
Compare
|
Persistent review updated to latest commit 28ac22a |
32d7358
into
opensearch-project:feature/mustang-ppl-integration
…search-project#5432) Today `RestUnifiedQueryAction.isAnalyticsIndex` dispatches to the analytics engine when the source index name starts with `parquet_`. That's brittle — it conflates naming convention with storage type. An index created without the prefix but with pluggable dataformat enabled is silently sent to the Lucene path; an index named `parquet_foo` without the setting is mis-dispatched to analytics. Use the authoritative signal instead: the `index.pluggable.dataformat.enabled` setting on cluster-state metadata. This is the same setting integration tests (`CoordinatorReduceIT`, `CompositeCommitDeletionIT`, etc.) already use to create analytics-backed indices, and it's what `FieldStorageResolver` reads to decide field-level storage. Behavior: - `index.pluggable.dataformat.enabled=true` → analytics engine (DataFusion) - flag absent / false / index missing → Calcite→OpenSearch DSL path Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Single squashed delivery of the long-running feature/mustang-ppl-integration branch into main, consolidating 22 feature-branch PRs plus the conflict-resolved merge of current main. Squashed because the feature branch's history includes commits with missing or mismatched Signed-off-by trailers that block DCO at this scope — the equivalent issue documented for the catch-up squashes (opensearch-project#5397). The feature branch f006e29 is retained for individual-commit lineage. ### What this delivers Analytics-engine PPL integration — a new execution path that routes Parquet-backed (non-Lucene) indices through an analytics engine while keeping Lucene-backed indices on the existing v2 / Calcite paths. Headline pieces: - Query routing (opensearch-project#5267) — PPL queries against Parquet-backed indices hand off to the analytics-engine execution path; Lucene-backed indices continue through the legacy path - Explain support (opensearch-project#5275) — EXPLAIN covers the analytics-engine path - Profiling + UnifiedQueryParser (opensearch-project#5285) — migrates PPL parsing to the unified parser and wires profiling metrics through the analytics path - extendedPlugins wiring (opensearch-project#5302) — analytics-engine attaches as an OpenSearch extension via SPI - SQL REST endpoint integration (opensearch-project#5317) — same analytics-route fork applied to the SQL transport, plus delegateToV2Engine extraction in RestSqlAction - Async QueryPlanExecutor (opensearch-project#5396) — async execution for analytics-engine plans + version bump to OpenSearch 3.7 - Optional dependency (opensearch-project#5403) — analytics-engine becomes an optional runtime dep so the SQL bundle is shippable without it - Index-setting-based routing (opensearch-project#5429, opensearch-project#5432) — replaces the earlier table-name-prefix heuristic with an authoritative index-setting check Supporting infrastructure: - Gradle wrapper bump to 9.4.1 (opensearch-project#5406) - Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 / httpcore5-reactive / httpclient5 (opensearch-project#5400, opensearch-project#5409) - IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT carried through the helper-managed index path (opensearch-project#5407, opensearch-project#5417); CalciteReplaceCommandIT column-order-agnostic (opensearch-project#5415); @ignore'd Calcite ITs dropped from CalciteNoPushdownIT (opensearch-project#5416) - plugins.calcite.enabled=true defaulted on the unified query path (opensearch-project#5413) - PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (opensearch-project#5418) - Calcite tolerance fixes: array() default type (opensearch-project#5421), containsNestedAggregator flat-leaf schemas (opensearch-project#5423) - Sandbox deps switched to analytics-api JDK 21 surface (opensearch-project#5426) ### Feature-branch commits squashed (22) opensearch-project#5432, opensearch-project#5429, opensearch-project#5426, opensearch-project#5423, opensearch-project#5421, opensearch-project#5418, opensearch-project#5403, opensearch-project#5417, opensearch-project#5415, opensearch-project#5416, opensearch-project#5413, opensearch-project#5407, opensearch-project#5409, opensearch-project#5406, opensearch-project#5400, opensearch-project#5396, opensearch-project#5317, opensearch-project#5302, opensearch-project#5285, opensearch-project#5275, opensearch-project#5267, opensearch-project#5397, opensearch-project#5286 ### Main commits absorbed via the merge (54) Brings the branch up to current upstream/main (54 commits since the last catch-up at opensearch-project#5397, divergence point 513e1b2). Highlights: opensearch-project#5419, opensearch-project#5408, opensearch-project#5414, opensearch-project#5399, opensearch-project#5394, opensearch-project#5361, opensearch-project#5360, opensearch-project#5240, opensearch-project#5266, opensearch-project#5278, plus 44 others (bugfixes, doc updates, infra). ### Conflict resolutions (7) Resolved during the merge of main into the feature branch. Resolution kept the feature branch's analytics-engine-path semantics where main's changes would have regressed them. - api/.../UnifiedQueryContext.java Blank-line-only conflict; took main's tighter formatting. - core/.../executor/QueryService.java Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...) wrapping (required for analytics-engine classloader isolation) and the matching import. - integ-test/build.gradle Kept feature's detailed root-cause comment on the Gradle 9.4.1 TestEventReporterAsListener workaround; kept ASCII ordering of JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT entries. - integ-test/.../CalciteEvalCommandIT.java Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards on test_eval and test_eval_agent setup (needed for the helper-managed index analytics-engine compatibility run). - legacy/.../RestSqlAction.java Kept feature's delegateToV2Engine(...) (extracted from the analytics-engine routing path). Both sides added handleException / getRestStatus / getRawErrorCode; removed the duplicate set git produced. - plugin/.../SQLPlugin.java Took the union of imports: ExecutionEngine + ExecutionEngine.ExplainResponse + QueryType. - plugin/.../transport/TransportPPLQueryAction.java Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and feature's local pluginSettings / pluginSettingsRef wiring. EngineExtensionsHolder.java is a new file from main (opensearch-project#5298) preserved as-is. ### Compatibility / opt-in The analytics-engine path is gated by the extendedPlugins extension being installed (opensearch-project#5403 makes the dep optional). Clusters without analytics-engine installed see no behavior change. Clusters with analytics-engine installed route only Parquet-backed indices through the new path (opensearch-project#5429 — by index setting). ### Verification - ./gradlew :api:compileJava :core:compileJava :legacy:compileJava :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes locally Signed-off-by: Kai Huang <ahkcs@amazon.com> Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>
* Land analytics-engine PPL integration into main Single squashed delivery of the long-running feature/mustang-ppl-integration branch into main, consolidating 22 feature-branch PRs plus the conflict-resolved merge of current main. Squashed because the feature branch's history includes commits with missing or mismatched Signed-off-by trailers that block DCO at this scope — the equivalent issue documented for the catch-up squashes (#5397). The feature branch f006e29 is retained for individual-commit lineage. ### What this delivers Analytics-engine PPL integration — a new execution path that routes Parquet-backed (non-Lucene) indices through an analytics engine while keeping Lucene-backed indices on the existing v2 / Calcite paths. Headline pieces: - Query routing (#5267) — PPL queries against Parquet-backed indices hand off to the analytics-engine execution path; Lucene-backed indices continue through the legacy path - Explain support (#5275) — EXPLAIN covers the analytics-engine path - Profiling + UnifiedQueryParser (#5285) — migrates PPL parsing to the unified parser and wires profiling metrics through the analytics path - extendedPlugins wiring (#5302) — analytics-engine attaches as an OpenSearch extension via SPI - SQL REST endpoint integration (#5317) — same analytics-route fork applied to the SQL transport, plus delegateToV2Engine extraction in RestSqlAction - Async QueryPlanExecutor (#5396) — async execution for analytics-engine plans + version bump to OpenSearch 3.7 - Optional dependency (#5403) — analytics-engine becomes an optional runtime dep so the SQL bundle is shippable without it - Index-setting-based routing (#5429, #5432) — replaces the earlier table-name-prefix heuristic with an authoritative index-setting check Supporting infrastructure: - Gradle wrapper bump to 9.4.1 (#5406) - Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 / httpcore5-reactive / httpclient5 (#5400, #5409) - IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT carried through the helper-managed index path (#5407, #5417); CalciteReplaceCommandIT column-order-agnostic (#5415); @ignore'd Calcite ITs dropped from CalciteNoPushdownIT (#5416) - plugins.calcite.enabled=true defaulted on the unified query path (#5413) - PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (#5418) - Calcite tolerance fixes: array() default type (#5421), containsNestedAggregator flat-leaf schemas (#5423) - Sandbox deps switched to analytics-api JDK 21 surface (#5426) ### Feature-branch commits squashed (22) #5432, #5429, #5426, #5423, #5421, #5418, #5403, #5417, #5415, #5416, #5413, #5407, #5409, #5406, #5400, #5396, #5317, #5302, #5285, #5275, #5267, #5397, #5286 ### Main commits absorbed via the merge (54) Brings the branch up to current upstream/main (54 commits since the last catch-up at #5397, divergence point 513e1b2). Highlights: #5419, #5408, #5414, #5399, #5394, #5361, #5360, #5240, #5266, #5278, plus 44 others (bugfixes, doc updates, infra). ### Conflict resolutions (7) Resolved during the merge of main into the feature branch. Resolution kept the feature branch's analytics-engine-path semantics where main's changes would have regressed them. - api/.../UnifiedQueryContext.java Blank-line-only conflict; took main's tighter formatting. - core/.../executor/QueryService.java Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...) wrapping (required for analytics-engine classloader isolation) and the matching import. - integ-test/build.gradle Kept feature's detailed root-cause comment on the Gradle 9.4.1 TestEventReporterAsListener workaround; kept ASCII ordering of JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT entries. - integ-test/.../CalciteEvalCommandIT.java Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards on test_eval and test_eval_agent setup (needed for the helper-managed index analytics-engine compatibility run). - legacy/.../RestSqlAction.java Kept feature's delegateToV2Engine(...) (extracted from the analytics-engine routing path). Both sides added handleException / getRestStatus / getRawErrorCode; removed the duplicate set git produced. - plugin/.../SQLPlugin.java Took the union of imports: ExecutionEngine + ExecutionEngine.ExplainResponse + QueryType. - plugin/.../transport/TransportPPLQueryAction.java Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and feature's local pluginSettings / pluginSettingsRef wiring. EngineExtensionsHolder.java is a new file from main (#5298) preserved as-is. ### Compatibility / opt-in The analytics-engine path is gated by the extendedPlugins extension being installed (#5403 makes the dep optional). Clusters without analytics-engine installed see no behavior change. Clusters with analytics-engine installed route only Parquet-backed indices through the new path (#5429 — by index setting). ### Verification - ./gradlew :api:compileJava :core:compileJava :legacy:compileJava :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes locally Signed-off-by: Kai Huang <ahkcs@amazon.com> Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com> * Address @penghuo: revert stray blank line in doctest/build.gradle After 'apply plugin: opensearch.testclusters', one blank line is enough — restoring the single-blank spacing to match upstream/main. Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com> Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>
Adds a small, opt-in test infrastructure slice so the PPL integration
test suite can run end-to-end against the analytics-engine backend
without per-test rewiring.
`-Dtests.analytics.parquet_indices=true` makes `TestUtils.createIndexByRestClient`
back every test-created index with single-shard composite/parquet storage:
index.pluggable.dataformat.enabled = true
index.pluggable.dataformat = "composite"
index.composite.primary_data_format = "parquet"
`RestUnifiedQueryAction.isAnalyticsIndex` (post-opensearch-project#5432) reads these settings
and routes any query against such indices to the analytics-engine planner
(DataFusion). No additional cluster setting or routing override required —
the production routing logic is the single source of truth.
Also adds `PPLIntegTestCase.isAnalyticsParquetIndicesEnabled()` as a
per-test predicate so individual tests can branch their assertions on
engine semantics (DataFusion follows different ordering and null-bucket
semantics than the legacy V2 and Calcite-DSL paths).
Bulk loads on parquet-backed indices use `refresh=true` because
`analytics-backend-lucene`'s `LuceneCommitter.getSafeCommitInfo` is a
`TODO:: with index deleter` stub that hangs `refresh=wait_for` until the
test framework request timeout (~60s).
`integ-test/build.gradle` forwards the property to `:integTestRemote` so
the gradle command line is the single knob.
Default behavior is unchanged — with the flag unset, every test-created
index is Lucene-backed and every IT runs through the existing V2 /
Calcite path exactly as before.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Summary
index.pluggable.dataformat.enabled=trueANDindex.pluggable.dataformat=compositebefore routing to the analytics pathcomposite(e.g.lucene), fall through to the standard Calcite→DSL pathpluginSettingstoRestUnifiedQueryActionto support cluster-level setting overrides (rex max_match limit)Test plan
RestUnifiedQueryActionTest)